AITopics | Cache County

Collaborating Authors

Cache County

cd04ec5aebfbe397c7fd718c35d02e0b-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 05:00:32 GMT

artificial intelligence, machine learning, symmetry, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > New York > Albany County > Albany (0.04)
Antarctica (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The Generalized Proximity Forest

Shaw, Ben, Rustad, Adam, Maia, Sofia Pelagalli, Rhodes, Jake S., Moon, Kevin R.

arXiv.org Machine LearningNov-26-2025

Abstract--Recent work has demonstrated the utility of Random Forest (RF) proximities for various supervised machine learning tasks, including outlier detection, missing data imputation, and visualization. However, the utility of the RF proximities depends upon the success of the RF model, which itself is not the ideal model in all contexts. RF proximities have recently been extended to time series by means of the distance-based Proximity Forest (PF) model, among others, affording time series analysis with the benefits of RF proximities. In this work, we introduce the generalized PF model, thereby extending RF proximities to all contexts in which supervised distance-based machine learning can occur . Additionally, we introduce a variant of the PF model for regression tasks. We also introduce the notion of using the generalized PF model as a meta-learning framework, extending supervised imputation capability to any pre-trained classifier . We experimentally demonstrate the unique advantages of the generalized PF model compared with both the RF model and the k-nearest neighbors model.

imputation, pf model, proximity, (16 more...)

arXiv.org Machine Learning

2511.19487

Country:

North America > United States > Utah > Cache County > Logan (0.14)
North America > United States > Utah > Utah County > Provo (0.05)
Asia > Philippines (0.04)
Antarctica (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

Symmetry Discovery Beyond Affine Transformations

Neural Information Processing SystemsOct-10-2025, 16:52:03 GMT

Symmetry detection can improve various machine learning tasks.

experiment, symmetry, vector field, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > New York > Albany County > Albany (0.04)
Antarctica (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Mathematical Theory of Collinearity Effects on Machine Learning Variable Importance Measures

Bladen, Kelvyn K., Cutler, D. Richard, Wisler, Alan

arXiv.org Machine LearningOct-2-2025

In many machine learning problems, understanding variable importance is a central concern. Two common approaches are Permute-and-Predict (PaP), which randomly permutes a feature in a validation set, and Leave-One-Covariate-Out (LOCO), which retrains models after permuting a training feature. Both methods deem a variable important if predictions with the original data substantially outperform those with permutations. In linear regression, empirical studies have linked PaP to regression coefficients and LOCO to $t$-statistics, but a formal theory has been lacking. We derive closed-form expressions for both measures, expressed using square-root transformations. PaP is shown to be proportional to the coefficient and predictor variability: $\text{PaP}_i = β_i \sqrt{2\operatorname{Var}(\mathbf{x}^v_i)}$, while LOCO is proportional to the coefficient but dampened by collinearity (captured by $Δ$): $\text{LOCO}_i = β_i (1 -Δ)\sqrt{1 + c}$. These derivations explain why PaP is largely unaffected by multicollinearity, whereas LOCO is highly sensitive to it. Monte Carlo simulations confirm these findings across varying levels of collinearity. Although derived for linear regression, we also show that these results provide reasonable approximations for models like Random Forests. Overall, this work establishes a theoretical basis for two widely used importance measures, helping analysts understand how they are affected by the true coefficients, dimension, and covariance structure. This work bridges empirical evidence and theory, enhancing the interpretability and application of variable importance measures.

collinearity, loco, theorem 2, (14 more...)

arXiv.org Machine Learning

2510.00557

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

Rhodes, Jake S., Rustad, Adam G., Maia, Sofia Pelagalli, Thacker, Evan, Choi, Hyunmi, Gutierrez, Jose, Rundek, Tatjana, Shaw, Ben

arXiv.org Machine LearningSep-30-2025

Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in the context of time series classification, where each time series is associated with a categorical label. We define a means of imputing missing values conditional upon labels, the method being guided by powerful, existing supervised models designed for high accuracy in this task. From each model, we extract a tree-based proximity measure from which imputation can be applied. We show that imputation using this method generally provides richer information leading to higher classification accuracies, despite the imputed values differing from the true values.

classification, imputation, time sery, (15 more...)

arXiv.org Machine Learning

2509.22919

Country:

North America > United States > Utah > Utah County > Provo (0.05)
North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Toward using explainable data-driven surrogate models for treating performance-based seismic design as an inverse engineering problem

Esteghamati, Mohsen Zaker

arXiv.org Machine LearningAug-4-2025

This study presents a methodology to treat performance-based seismic design as an inverse engineering problem, where design parameters are directly derived to achieve specific performance objectives. By implementing explainable machine learning models, this methodology directly maps design variables and performance metrics, tackling computational inefficiencies of performance-based design. The resultant machine learning model is integrated as an evaluation function into a genetic optimization algorithm to solve the inverse problem. The developed methodology is then applied to two different inventories of steel and concrete moment frames in Los Angeles and Charleston to obtain sectional properties of frame members that minimize expected annualized seismic loss in terms of repair costs. The results show high accuracy of the surrogate models (e.g., R2> 90%) across a diverse set of building types, geometries, seismic design, and site hazard, where the optimization algorithm could identify the optimum values of members' properties for a fixed set of geometric variables, consistent with engineering principles.

artificial intelligence, machine learning, surrogate model, (20 more...)

arXiv.org Machine Learning

doi: 10.1098/rsta.2024.0050

2508.00286

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.34)
North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > South Carolina > Charleston County > Charleston (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Materials > Construction Materials (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

Nordby, Ross

arXiv.org Artificial IntelligenceMay-22-2025

To help evaluate and understand the latent capabilities of language models, this paper introduces an approach using optimized input embeddings, or 'soft prompts,' as a metric of conditional distance between a model and a target behavior. The technique aims to facilitate latent capability discovery as a part of automated red teaming/evaluation suites and to provide quantitative feedback about the accessibility of potentially concerning behaviors in a way that may scale to powerful future models, including those which may otherwise be capable of deceptive alignment. An evaluation framework using soft prompts is demonstrated in natural language, chess, and pathfinding, and the technique is extended with generalized conditional soft prompts to aid in constructing task evaluations.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.14943

Country:

Africa (0.04)
North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Education (0.68)
Leisure & Entertainment > Games > Chess (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Planning, scheduling, and execution on the Moon: the CADRE technology demonstration mission

Rabideau, Gregg, Russino, Joseph, Branch, Andrew, Dhamani, Nihal, Vaquero, Tiago Stegun, Chien, Steve, de la Croix, Jean-Pierre, Rossi, Federico

arXiv.org Artificial IntelligenceFeb-20-2025

NASA's Cooperative Autonomous Distributed Robotic Exploration (CADRE) mission, slated for flight to the Moon's Reiner Gamma region in 2025/2026, is designed to demonstrate multi-agent autonomous exploration of the Lunar surface and sub-surface. A team of three robots and a base station will autonomously explore a region near the lander, collecting the data required for 3D reconstruction of the surface with no human input; and then autonomously perform distributed sensing with multi-static ground penetrating radars (GPR), driving in formation while performing coordinated radar soundings to create a map of the subsurface. At the core of CADRE's software architecture is a novel autonomous, distributed planning, scheduling, and execution (PS&E) system. The system coordinates the robots' activities, planning and executing tasks that require multiple robots' participation while ensuring that each individual robot's thermal and power resources stay within prescribed bounds, and respecting ground-prescribed sleep-wake cycles. The system uses a centralized-planning, distributed-execution paradigm, and a leader election mechanism ensures robustness to failures of individual agents. In this paper, we describe the architecture of CADRE's PS&E system; discuss its design rationale; and report on verification and validation (V&V) testing of the system on CADRE's hardware in preparation for deployment on the Moon.

agent, constraint, rover, (15 more...)

arXiv.org Artificial Intelligence

2502.14803

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry:

Energy (0.93)
Government > Space Agency (0.91)
Government > Regional Government > North America Government > United States Government (0.91)
Telecommunications (0.88)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

BadSAD: Clean-Label Backdoor Attacks against Deep Semi-Supervised Anomaly Detection

Cheng, He, Xu, Depeng, Yuan, Shuhan

arXiv.org Artificial IntelligenceDec-17-2024

Image anomaly detection (IAD) is essential in applications such as industrial inspection, medical imaging, and security. Despite the progress achieved with deep learning models like Deep Semi-Supervised Anomaly Detection (DeepSAD), these models remain susceptible to backdoor attacks, presenting significant security challenges. In this paper, we introduce BadSAD, a novel backdoor attack framework specifically designed to target DeepSAD models. Our approach involves two key phases: trigger injection, where subtle triggers are embedded into normal images, and latent space manipulation, which positions and clusters the poisoned images near normal images to make the triggers appear benign. Extensive experiments on benchmark datasets validate the effectiveness of our attack strategy, highlighting the severe risks that backdoor attacks pose to deep learning-based anomaly detection systems.

abnormal image, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.13324

Country:

North America > United States > Utah > Cache County > Logan (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
Asia (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SiReRAG: Indexing Similar and Related Information for Multihop Reasoning

Zhang, Nan, Choubey, Prafulla Kumar, Fabbri, Alexander, Bernadett-Shapiro, Gabriel, Zhang, Rui, Mitra, Prasenjit, Xiong, Caiming, Wu, Chien-Sheng

arXiv.org Artificial IntelligenceDec-8-2024

Indexing is an important step towards strong performance in retrieval-augmented generation (RAG) systems. However, existing methods organize data based on either semantic similarity (similarity) or related information (relatedness), but do not cover both perspectives comprehensively. Our analysis reveals that modeling only one perspective results in insufficient knowledge synthesis, leading to suboptimal performance on complex tasks requiring multihop reasoning. In this paper, we propose SiReRAG, a novel RAG indexing approach that explicitly considers both similar and related information. On the similarity side, we follow existing work and explore some variances to construct a similarity tree based on recursive summarization. On the relatedness side, SiReRAG extracts propositions and entities from texts, groups propositions via shared entities, and generates recursive summaries to construct a relatedness tree. We index and flatten both similarity and relatedness trees into a unified retrieval pool. Our experiments demonstrate that SiReRAG consistently outperforms state-of-the-art indexing methods on three multihop datasets (MuSiQue, 2WikiMultiHopQA, and HotpotQA), with an average 1.9% improvement in F1 scores. As a reasonably efficient solution, SiReRAG enhances existing reranking methods significantly, with up to 7.8% improvement in average F1 scores.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.06206

Country:

North America > United States > California (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Washington > Walla Walla County > Walla Walla (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback